Statistical and Linguistic Clustering for Language Modeling in ASR

نویسندگان

  • Raquel Justo
  • M. Inés Torres
چکیده

In this work several sets of categories obtained by a statistical clustering algorithm, as well as a linguistic set, were used to design category-based language models. The language models proposed were evaluated, as usual, in terms of perplexity of the text corpus. Then they were integrated into an ASR system and also evaluated in terms of system performance. It can be seen that category-based language models can perform better, also in terms of WER, when categories are obtained through statistical models instead of using linguistic techniques. They also show that better system performance are obtained when the language model interpolates category based and word based models.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Exploiting Linguistic Knowledge in Language Modeling of Czech Spontaneous Speech

In our paper, we present a method for incorporating available linguistic information into a statistical language model that is used in ASR system for transcribing spontaneous speech. We employ the class-based language model paradigm and use the morphological tags as the basis for world-to-class mapping. Since the number of different tags is at least by one order of magnitude lower than the numb...

متن کامل

Practical and Efficient Incorporation of Syntactic Features into Statistical Language Models

Automatic Speech Recognition (ASR) and Statistical Machine Translation (SMT), among other natural language processing applications, rely on a language model (LM) to provide a strong linguistic prior over word sequences of the often prohibitively large and complex hypothesis space of these systems. The language models deployed in most state-of-the-art ASR and SMT systems are n-gram models. Sever...

متن کامل

Statistical Method of Building Dialect Language Models for ASR Systems

This paper develops a new statistical method of building language models (LMs) of Japanese dialects for automatic speech recognition (ASR). One possible application is to recognize a variety of utterances in our daily lives. The most crucial problem in training language models for dialects is the shortage of linguistic corpora in dialects. Our solution is to transform linguistic corpora into di...

متن کامل

Unsupervised Clustering and Automatic Language Model Generation for ASR

The goal of an automatic speech recognition system is to enable the computer in understanding human speech and act accordingly. In order to realize this goal, language modeling plays an important role. It works as a knowledge source through mimicking human comprehension mechanism in understanding the language. Among many other approaches, statistical language modeling technique is widely used i...

متن کامل

Multilingual speech recognition A posterior based approach

Modern automatic speech recognition (ASR) systems are based on parametric statistical models such as hidden Markov models (HMMs), exploiting 1) acoustic-phonetic models, which need to be trained on large amount of acoustic data, 2) a language model, which needs to be trained on large amount of text data and, finally, 3) a lexicon with phonetic transcription which requires linguistic expertise. ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005